A Fast Greedy Algorithm for Generalized Column Subset Selection

Authors

  • Ahmed K. Farahat
  • Ali Ghodsi
  • Mohamed S. Kamel
Abstract

This paper defines a generalized column subset selection problem, which is concerned with selecting a few columns from a source matrix $A$ that best approximate the span of a target matrix $B$. The paper then proposes a fast greedy algorithm for solving this problem and draws connections to different problems that can be solved efficiently using the proposed algorithm.

1 Generalized Column Subset Selection

The Column Subset Selection (CSS) problem can be generally defined as the selection of a few columns from a data matrix that best approximate its span [2–5,10,15]. We extend this definition to the generalized problem of selecting a few columns from a source matrix to approximate the span of a target matrix. The generalized CSS problem can be formally defined as follows:

Problem 1 (Generalized Column Subset Selection) Given a source matrix $A \in \mathbb{R}^{m \times n}$, a target matrix $B \in \mathbb{R}^{m \times r}$ and an integer $l$, find a subset of columns $L$ from $A$ such that $|L| = l$ and

$$L = \arg\min_{S} \left\| B - P^{(S)} B \right\|_F^2 ,$$

where $S$ is the set of the indices of the candidate columns from $A$, $P^{(S)} \in \mathbb{R}^{m \times m}$ is a projection matrix which projects the columns of $B$ onto the span of the set $S$ of columns, and $L$ is the set of the indices of the selected columns from $A$.

The CSS criterion $F(S) = \| B - P^{(S)} B \|_F^2$ represents the sum of squared errors between the target matrix $B$ and its rank-$l$ approximation $P^{(S)} B$. In other words, it calculates the squared Frobenius norm of the residual matrix $F = B - P^{(S)} B$. Other types of matrix norms can also be used to quantify the reconstruction error [2, 3]. The present work, however, focuses on developing algorithms that minimize the Frobenius norm of the residual matrix.

The projection matrix $P^{(S)}$ can be calculated as

$$P^{(S)} = A_{:S} \left( A_{:S}^T A_{:S} \right)^{-1} A_{:S}^T ,$$

where $A_{:S}$ is the sub-matrix of $A$ which consists of the columns corresponding to $S$. It should be noted that if $S$ is known, the term $\left( A_{:S}^T A_{:S} \right)^{-1} A_{:S}^T B$ is the closed-form solution of the least-squares problem $T^{*} = \arg\min_{T} \| B - A_{:S} T \|_F^2$.
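The definitions above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the paper's fast greedy algorithm: the function names (`projection_matrix`, `css_criterion`, `best_subset_bruteforce`) and the random test matrices are hypothetical, the selected columns are assumed to be linearly independent, and the exhaustive search is practical only for very small matrices.

```python
from itertools import combinations

import numpy as np


def projection_matrix(A, S):
    """P^(S) = A_S (A_S^T A_S)^{-1} A_S^T; assumes the columns in S are linearly independent."""
    A_S = A[:, list(S)]
    # Solve (A_S^T A_S) X = A_S^T rather than forming an explicit inverse.
    return A_S @ np.linalg.solve(A_S.T @ A_S, A_S.T)


def css_criterion(A, B, S):
    """F(S) = ||B - P^(S) B||_F^2, evaluated through the least-squares solution T*."""
    A_S = A[:, list(S)]
    T_star, *_ = np.linalg.lstsq(A_S, B, rcond=None)
    residual = B - A_S @ T_star
    return float(np.sum(residual ** 2))


def best_subset_bruteforce(A, B, l):
    """Solve Problem 1 by trying every subset of l columns (illustration only)."""
    n = A.shape[1]
    return min(combinations(range(n), l), key=lambda S: css_criterion(A, B, S))


# Hypothetical toy data: 6 rows, 5 source columns, 3 target columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
B = rng.standard_normal((6, 3))

S = (0, 2)
P = projection_matrix(A, S)
explicit = float(np.linalg.norm(B - P @ B, "fro") ** 2)  # criterion via P^(S)
via_lstsq = css_criterion(A, B, S)                       # criterion via T*
L = best_subset_bruteforce(A, B, 2)
```

Evaluating the criterion through the least-squares solution avoids building the $m \times m$ projection matrix, and the two computations should agree up to floating-point error.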
2 A Fast Greedy Algorithm for Generalized CSS

Problem 1 is a combinatorial optimization problem whose optimal solution can be obtained in O (


Similar resources

Greedy Representative Selection for Unsupervised Data Analysis

In recent years, the advance of information and communication technologies has allowed the storage and transfer of massive amounts of data. The availability of this overwhelming amount of data stimulates a growing need to develop fast and accurate algorithms to discover useful information hidden in the data. This need is even more acute for unsupervised data, which lacks information about the c...


Greedy Column Subset Selection: New Bounds and Distributed Algorithms

The problem of column subset selection has recently attracted a large body of research, with feature selection serving as one obvious and important application. Among the techniques that have been applied to solve this problem, the greedy algorithm has been shown to be quite effective in practice. However, theoretical guarantees on its performance have not been explored thoroughly, especially i...


Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets

Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...


A novel greedy algorithm for Nyström approximation

The Nyström method is an efficient technique for obtaining a low-rank approximation of a large kernel matrix based on a subset of its columns. The quality of the Nyström approximation highly depends on the subset of columns used, which are usually selected using random sampling. This paper presents a novel recursive algorithm for calculating the Nyström approximation, and an effective greedy cr...


On Subset Selection with General Cost Constraints

This paper considers the subset selection problem with a monotone objective function and a monotone cost constraint, which relaxes the submodular property of previous studies. We first show that the approximation ratio of the generalized greedy algorithm is $\frac{\alpha}{2}\left(1 - \frac{1}{e^{\alpha}}\right)$ (where $\alpha$ is the submodularity ratio); and then propose POMC, an anytime randomized iterative approach that can utilize more ...



Journal:
  • CoRR

Volume abs/1312.6820

Publication date: 2013